Closing the DBTL loop with TeselaGen

With TeselaGen's platform you can close the Design-Build-Test-Learn (DBTL) cycle using machine learning algorithms that learn from your data. Based on your previous experimental rounds, the DISCOVER module suggests new candidates that may improve your results. This document shows how to turn those candidates into new designs in the DESIGN module so that you can run the next DBTL cycle.

Inputs: Results of an Evolution algorithm in a DISCOVER module instance

Outputs: New designs created in the DESIGN module

Requirements:

Access permissions to the lab where the Evolution results are stored
Python 3 installed on your local computer, with pandas and TeselaGen's api-client

First, we make all the required imports:

import platform
from IPython.display import display, HTML
import pandas as pd

from teselagen.api import DISCOVERClient, DESIGNClient
from teselagen.utils.candidates_to_design import build_design_from_candidates

print(f"python version     : {platform.python_version()}")
print(f"pandas version     : {pd.__version__}")

python version     : 3.6.9
pandas version     : 1.1.4

Look for your Evolution results

Here, the concept of closing the DBTL loop refers to the ability to generate designs out of what was learned from previous experiments. Those designs can be used to conduct new experimental rounds. This notebook assumes you've already trained an Evolution model.

The results of an Evolution model contain a set of ranked candidates that may outperform your current measurements. Each of the proposed candidates is a combination of the parts (and possibly other variables) you have already tested within the designs in your experiments. These new combinations were evaluated and ranked by a machine learning algorithm and we will generate proper designs with them.

This guide starts from the output of the Evolution tool in DISCOVER. The next cell connects the notebook to DISCOVER and selects the Common lab, which holds our sample experiment:

# Connect to your TeselaGen instance by passing its URL as the 'host_url' argument: DISCOVERClient(host_url=host_url)
# client = DISCOVERClient(host_url="https://your-instance-name.teselagen.com")
client = DISCOVERClient()
client.login()
client.select_laboratory()

Connection Accepted
Received None lab identifiers
Selected Common Lab

Next, we look up the evolutive model named "Teselagen Example Evolutive Model":

search_for_name = "Teselagen Example Evolutive Model"
evolution_models_info = client.get_models_by_type('evolutive')
model_id = -1
for info in evolution_models_info:
    if info['name'] == search_for_name:
        model_id = info['id']
        print(f"Model id {info['id']}, name: {info['name']}")
if model_id == -1:
    raise IOError(f"Model '{search_for_name}' not found")

Model id 65, name: Teselagen Example Evolutive Model
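The lookup loop above can also be wrapped in a small helper. The following is a minimal sketch, not part of the api-client: `find_model_id` is a hypothetical name, and the only assumption about the list returned by `get_models_by_type` is that each entry is a dictionary with the `id` and `name` keys used in the loop. The entry with id 64 is made up for illustration. Unlike the loop, which keeps the last match, this version returns the first one.

```python
from typing import Any, Dict, List


def find_model_id(models: List[Dict[str, Any]], name: str) -> Any:
    """Return the id of the first model whose 'name' matches exactly."""
    match = next((m for m in models if m.get("name") == name), None)
    if match is None:
        raise LookupError(f"No model named {name!r} was found")
    return match["id"]


# Stand-in for the list returned by client.get_models_by_type('evolutive').
# Only the 'id' and 'name' keys are assumed; the first entry is hypothetical.
example_models = [
    {"id": 64, "name": "Some Other Model"},
    {"id": 65, "name": "Teselagen Example Evolutive Model"},
]

found_id = find_model_id(example_models, "Teselagen Example Evolutive Model")
print(f"Model id {found_id}")
```

Raising an exception when nothing matches avoids carrying a sentinel value such as -1 into later cells.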

Now get the model's results. The results object contains predictions for several untested combinations. We will focus on the rows with valid priority values, which are the best candidates suggested by the algorithm:

# Use the id found in the previous cell instead of a hard-coded value
results = client.get_model_datapoints(model_id=str(model_id), datapoint_type="output", batch_size=400, batch_number=1)
data = pd.DataFrame([el['datapoint'] for el in results['data']])
data = data.dropna(subset=['priority']).reset_index(drop=True)
display(data)

Note that the algorithm doesn't suggest candidates you've already tested. That's why the Production column, which is the unknown variable for untested combinations in this example, contains only NaN values.
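To make the filtering step concrete, here is a minimal sketch of what `dropna(subset=['priority'])` does, written for plain Python records rather than a DataFrame. The helper names are hypothetical, and the row with a NaN priority is made up for illustration (the other two rows come from the example results). The final sort by priority is an addition; `dropna` itself only removes rows.

```python
import math
from typing import Any, Dict, List


def has_priority(record: Dict[str, Any]) -> bool:
    """True when the record carries a usable (non-missing) priority value."""
    value = record.get("priority")
    if value is None:
        return False
    return not (isinstance(value, float) and math.isnan(value))


def ranked_candidates(records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep records with a valid priority, ordered from best (lowest) to worst."""
    kept = [r for r in records if has_priority(r)]
    return sorted(kept, key=lambda r: r["priority"])


records = [
    {"Teselagen Enzyme A": "Variant A4", "Teselagen Enzyme B": "Variant B3", "priority": 1.0},
    # Hypothetical row: a prediction that was not selected as a candidate.
    {"Teselagen Enzyme A": "Variant A0", "Teselagen Enzyme B": "Variant B0", "priority": float("nan")},
    {"Teselagen Enzyme A": "Variant A1", "Teselagen Enzyme B": "Variant B5", "priority": 0.0},
]

top = ranked_candidates(records)
for row in top:
    print(row["priority"], row["Teselagen Enzyme A"], row["Teselagen Enzyme B"])
```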

Build the design JSON

Now we need to generate a JSON representation of the candidates so that it can be imported into DESIGN. The api-client library includes a utility for this, called build_design_from_candidates. It takes a list of dictionaries as input and requires you to declare explicitly which columns should be interpreted as bins. Continuing with the example:

design = build_design_from_candidates(
    candidates_data=data.to_dict(orient="records"),
    bin_cols=['Teselagen Enzyme A', 'Teselagen Enzyme B'],
    name="Closing DBTL Example",
    priority_col='priority',
)

Generating design using 10 candidates
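Because the bin columns have to be declared by name, it can help to check that every candidate actually has a value for each of them before building the design. The sketch below is one way to do that under stated assumptions: `missing_bin_columns` is a hypothetical helper, not part of the api-client, and this guide does not say whether build_design_from_candidates performs a similar check itself. The second candidate is deliberately incomplete for illustration.

```python
from typing import Any, Dict, List


def is_missing(value: Any) -> bool:
    """Treat None, empty strings and NaN (the only value unequal to itself) as missing."""
    return value is None or value == "" or value != value


def missing_bin_columns(
    candidates: List[Dict[str, Any]], bin_cols: List[str]
) -> Dict[str, List[int]]:
    """Map each bin column to the indices of candidates that lack a value for it."""
    problems: Dict[str, List[int]] = {}
    for col in bin_cols:
        bad = [i for i, cand in enumerate(candidates) if is_missing(cand.get(col))]
        if bad:
            problems[col] = bad
    return problems


# Same shape as data.to_dict(orient="records"): one dictionary per candidate.
candidates = [
    {"Teselagen Enzyme A": "Variant A1", "Teselagen Enzyme B": "Variant B5", "priority": 0.0},
    # Hypothetical incomplete candidate, included only to trigger the check.
    {"Teselagen Enzyme A": "Variant A4", "priority": 1.0},
]

problems = missing_bin_columns(candidates, ["Teselagen Enzyme A", "Teselagen Enzyme B"])
print(problems)
```

An empty dictionary means every candidate has a value in every declared bin column.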

The design variable contains a dictionary representation of the design. This representation can easily be stored as a JSON file and then uploaded into DESIGN. To upload it directly from the notebook, we need to create a DESIGNClient instance:

design_client = DESIGNClient(host_url=client.host_url)
design_client.select_laboratory()

Received None lab identifiers
Selected Common Lab

Then upload the design. On success, the post_design method returns a dictionary with the id of the newly created design:

response = design_client.post_design(design=design)
display(response)

Connection Accepted

{'id': '1215'}
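As noted earlier, the design dictionary can also be kept as a local JSON file. The following is a minimal sketch using only the standard library. The `save_design` helper and the file name are hypothetical, and `example_design` is a placeholder, since the real structure produced by build_design_from_candidates is not shown in this guide.

```python
import json
import os
import tempfile


def save_design(design: dict, path: str) -> None:
    """Write the design dictionary to 'path' as indented JSON."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(design, handle, indent=2)


# Placeholder only: in the notebook, pass the 'design' variable instead.
example_design = {"name": "Closing DBTL Example"}

out_path = os.path.join(tempfile.mkdtemp(), "closing_dbtl_example.json")
save_design(example_design, out_path)

# Read the file back to confirm that the round trip preserves the content.
with open(out_path, encoding="utf-8") as handle:
    reloaded = json.load(handle)
print(reloaded == example_design)
```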

The new design should now be available in the DESIGN module. Uncomment and run the following cell to get the design link:

# design_url = f"{design_client.host_url}/design/client/designs/{response['id']}"
# display(HTML(f"""<a href="{design_url}">{design_url}</a>"""))
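The link in the cell above is the host URL followed by a fixed path and the design id. As a small sketch, the same string can be built with a helper that also tolerates a trailing slash in the host URL. The `design_link` name is hypothetical; the path and the example values are taken from this guide.

```python
def design_link(host_url: str, design_id: str) -> str:
    """Build the DESIGN page URL, using the same path as the cell above."""
    return f"{host_url.rstrip('/')}/design/client/designs/{design_id}"


# Example values from this guide: the placeholder host and the returned id.
link = design_link("https://your-instance-name.teselagen.com/", "1215")
print(link)
```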

Output of display(data), the candidates kept after filtering by priority:

   Teselagen Enzyme A  Teselagen Enzyme B  Production  prediction  sigma     acq       in_batch  priority
0  Variant A1          Variant B5          NaN         6.544949    2.851464  0.290311  True      0.0
1  Variant A4          Variant B3          NaN         6.224357    2.517428  0.160054  True      1.0
2  Variant A5          Variant B4          NaN         6.179204    2.372436  0.125915  True      2.0
3  Variant A1          Variant B3          NaN         4.842277    3.142049  0.127976  True      3.0
4  Variant A0          Variant B5          NaN         5.172261    3.389486  0.208031  True      4.0
5  Variant A3          Variant B5          NaN         7.085248    2.010375  0.168389  True      5.0
6  Variant A4          Variant B4          NaN         6.287957    2.238001  0.112734  True      6.0
7  Variant A5          Variant B1          NaN         4.789871    2.664990  0.059986  True      7.0
8  Variant A4          Variant B2          NaN         4.678120    2.237455  0.020480  True      8.0
9  Variant A2          Variant B3          NaN         5.693773    2.384573  0.082872  True      9.0